SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
Abstract
Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most of the existing datasets focus on understanding the content relationships within a single image and not across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images and 14.5k questions about a slide deck. SlideVQA requires complex reasoning, including single-hop, multi-hop, and numerical reasoning, and also provides annotated arithmetic expressions of numerical answers for enhancing the ability of numerical reasoning. Moreover, we developed a new end-to-end document VQA model that treats evidence selection and question answering as a unified sequence-to-sequence format. Experiments on SlideVQA show that our model outperformed existing state-of-the-art QA models, but that it still has a large gap behind human performance. We believe that our dataset will facilitate research on document VQA.
Similar resources
Question Answering on the SQuAD Dataset
We develop a deep learning framework for question answering on the Stanford Question Answering Dataset (SQuAD), blending ideas from existing state-of-the-art models to achieve results that surpass the original logistic regression baselines. Using a dynamic coattention encoder and an LSTM decoder, we achieved an F1 score of 55.9% on the hidden SQuAD test set. In this paper, we present the methodo...
SQuAD Question Answering Dataset: CS224N Assn 4
We solve the contextual question answering problem, which is an essential part of many automated question-answering datasets. Recently the SQuAD dataset [1] was released, and several deep learning approaches were proposed to solve it. We implement a modified version of one of them, the Dynamic Coattention model, as well as a simple baseline.
From Document Retrieval to Question Answering
Solving the Prerequisites: Improving Question Answering on the bAbI Dataset
The aim of this project is to make progress towards building a machine learning agent that understands natural language and can perform basic reasoning. Towards this nebulous goal, we focus on question answering: Can an agent answer a query based on a given set of natural language facts? We combine LSTM sentence embedding models with an attention mechanism and obtain good results on the Faceboo...
Toward a Document Model for Question Answering Systems
The problem of acquiring valuable information from the large amounts available today in electronic media requires automated mechanisms more natural and efficient than those already existing. The trend in the evolution of information retrieval systems goes toward systems capable of answering specific questions formulated by the user in her/his language. The expected answers from such systems are...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26598